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ABSTRACT 

Whether a correlation exists between the radio and gamma-ray flux densities of blazars is a long- 
standing question, and one that is difScult to answer confidently because of various observational biases 
which may either dilute or apparently enhance any intrinsic correlation between radio and gamma- 
ray luminosities. We introduce a novel method of data randomization to evaluate quantitatively the 
effect of these biases and to assess the intrinsic significance of an apparent correlation between radio 
and gamma-ray flux densities of blazars. The novelty of the method lies in a combination of data 
randomization in luminosity space (to ensure that the randomized data are intrinsically, and not just 
apparently, uncorrelated) and signiflcance assessment in flux space (to explicitly avoid Malmquist 
bias and automatically account for the limited dynamical range in both frequencies). The method 
is applicable even to small samples that are not selected with strict statistical criteria. For larger 
samples we describe a variation of the method in which the sample is split in redshift bins, and the 
randomization is applied in each bin individually; this variation is designed to yield the equivalent 
to luminosity-function sampling of the underlying population in the limit of very large, statistically 
complete samples. We show that for a smaller number of redshift bins, the method yields a worse 
significance, and in this way it is conservative: although it may fail to confirm an existing intrinsic 
correlation in a small sample that cannot be split into many redshift bins, it will not assign a stronger, 
artificially enhanced significance. We demonstrate how our test performs as a function of number of 
sources, strength of correlation, and number of redshift bins used, and we show that while our test is 
robust against common-distance biases and associated false positives for uncorrelated data, it retains 
the power of other methods in rejecting the null hypothesis of no correlation for correlated data. 
Subject headings: galaxies: active - gamma rays: galaxies - radio continuum: galaxies - methods: 
statistical 



1. INTRODUCTION 

Whether the radio and the gamma-ray luminosities of 
blazars are intrinsically correlated is a long-standing de- 
bate. The presence or absence of such a correlation could 
provide insight into blazar emission physics. At radio 
frequencies low enough that synchrotron emission is self- 
absorbed on physical scales likely to be associated with 
gamma-ray emission, measurements of the gamma-ray 
and radio flux densities typically probe different parts of 

^ Einstein Fellow 

^ current address: Max-Planck-Institut fiir Radioastronomie, 
Bonn 53121, Germany 

^ current address: Department of Physics, Purdue University, 
525 Northwestern Ave, West Lafayette, Indiana 47907 



the blazar jet. If concurrently-measured, time-averaged 
flux densities at self-absorbed radio frequencies and high- 
energy (> 100 MeV) gamma-rays are intrinsically corre- 
lated, the implication would be that emission and flaring 
in different parts of blazar jets are driven by the same 
disturbances. In this case, further progress on the se- 
quence of events that produce blazar flares can be made 
through high-cadence monitoring in both wavebands. If 
on the other hand radio and gamma-ray flux densities 
can be shown to be uncorrelated (a statement that needs 
to be carefully distinguished from the absence of evi- 
dence for correlation) then it is more likely that, over the 
timescales used for the flux-averaging, emission regions 
probed by radio and gamma-ray observations evolve and 
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radiate independently. Furthermore, should an intrin- 
sic correlation between gamma-ray and radio flux den- 
sities be unambiguously demonstrated, radio blazar lu- 
minosity functions could be used to establish the shape 
and normalization of gamma-ray luminosity functions or 
log N — log S distributions (however, proper care should 
be exercised to account for any significant scatter in the 
correlation, see, e.g., the discussion in Ackermann et 
al. 2011). From there, the unresolved blazar contribu- 
tion to the diffuse gamma-ray background could be esti- 
mated (e.g., Stecker & Salamon 1996; Kazanas & Perl- 
man 1997; Stecker & Venters 2010). This is particularly 
important as blazars constitute a guaranteed background 
for any search in the diffuse gamma-ray emission for yet- 
undetected classes of sources such as galaxy clusters, and 
for signatures of exotic physics. 

Strong correlations between radio and gamma-ray lu- 
minosities have been claimed based on EGRET data 
(e.g., Stecker et al. 1993; Padovani et al. 1993; Stecker 
& Salamon 1996). However, these findings have been 
disputed (e.g., Miicke et al. 1997; Chiang & Mukherjee 
1998) based on more detailed statistical analyses. The 
objections against the claimed correlations can be sum- 
marized as follows. 

First, artificial flux- flux correlations can be induced 
due to the effect of a common distance modulation of 
gamma-ray and radio luminosities. Feigelson & Berg 
(1983) have argued that in statistically complete surveys 
of relatively small depth, apparent flux-flux correlations 
do not appear unless the corresponding luminosities are 
intrinsically correlated: if luminosities are intrinsically 
uncorrelated most objects will only have an upper limit 
rather than a detection in one of the wavebands. How- 
ever this is not the case in samples that are selected with 
complex or subjective criteria, samples in which there 
is clustering around a preferred luminosity value, sam- 
ples in which detection in both wavebands is one of the 
selection criteria, or samples in which the luminosity dy- 
namical range is, for any reason, small compared to the 
distance modulation range. In such cases, the applica- 
tion of a common distance-squared factor to both radio 
and gamma-ray luminosity will automatically induce an 
artificial flux-flux correlation. 

This effect cannot be avoided simply by searching for 
correlation in luminosity space, as the danger of inducing 
an artificial apparent correlation is even greater in this 
case due to Malmquist bias: in fiux-limited (or approx- 
imately flux-limited) surveys, most objects are concen- 
trated close to the survey sensitivity at each wavelength. 
By modulating these limiting fluxes by a common dis- 
tance factor to return to luminosity space, artificial cor- 
relations arise. 

Finally, the data used to obtain the claimed correla- 
tions were not synchronous. The direction in which non- 
simultaneity affects any intrinsic correlation is unclear. 
On the one hand, non-simultaneous data may wash out 
an intrinsic correlation which might otherwise be found 
in concurrently measured data. On the other hand, the 
tendency to detect more flaring objects than objects in 
a quiescent state in surveys may lead to enhanced cor- 
relations, essentially representing peak flux / peak flux 
correlations of different flares, which although they may 
be indicative of the overall energetics of flares in a sin- 
gle object, they do not convey any detailed information 



regarding the time-averaged behavior of the object. In 
the Fermi era, the possibility of a correlation between 
gamma-ray and radio fluxes of blazars has generated a 
lot of interest, and the question has been explored us- 
ing Fermi Large Area Telescope (LAT) fluxes in combi- 
nation with archival (Ghirlanda et al. 2010, Mahony et 
al. 2010, Giroletti et al. 2010; Ackermann et al. 2011), 
quasi-concurrent (Kovalev et al. 2009) and concurrent 
(Giroletti et al. 2010; Ackermann et al. 2011, Angelakis 
et al. 2010; Fuhrmann et al. 2012, in preparation) radio 
data. 

The intrinsic signiflcance of an apparent correlation 
between radio and gamma-ray flux densities in strictly 
flux-limited, large datasets is relatively straight-forward 
to assess, by Monte-Carlo draws from the underlying lu- 
minosity functions in both datasets, obeying the same 
selection criteria as the observed sample of sources (e.g.. 
Bloom 2008). In practice however we frequently en- 
counter the case where a sample of monitored sources has 
been selected to optimize the likelihood of high-impact 
observations in individual objects using complex and of- 
ten subjective criteria, which are difficult to reproduce 
in a simulation. Although such samples are not ide- 
ally configured for unbiased population studies, they may 
present significant advantages in other respects, such as 
multi-band coverage, high cadence of observations, and 
simultaneity between different waveband data. It is thus 
important to be able to assess as robustly as possible 
the intrinsic significance of any apparent correlations ob- 
served in such samples. Here, we introduce a method for 
the quantitative assessment of the significance of a corre- 
lation in such cases, based on permutations of observed 
flux densities, while ensuring that the dynamical ranges 
in luminosity and flux density are kept fixed. When this 
method is applied in large, statistically complete sam- 
ples that are split in redshift bins, it asymptotically ap- 
proaches luminosity-function sampling. For smaller sam- 
ples, the significances it returns are conservative: exist- 
ing intrinsic correlations may not be verified, but exag- 
gerated significances are avoided. Our method has been 
recently used by the Fermi-LAT collaboration (Acker- 
mann et al. 2011) to study the correlation between GeV 
and cm radio fiuxes (both archival and concurrent, the 
latter from the Owens Valley Radio Observatory 15 GHz 
monitoring program, Richards et al. 2011^). They have 
established, at a very high significance level, the exis- 
tence of a positive correlation (< 10^^ probability of the 
correlation arising by chance). Our method is also cur- 
rently used in studies of multi-frequency concurrent radio 
observations by the F-GAMMA program (Angelakis et 
al. 2010; Fuhrmann et al. 2012, in preparation). Here, 
we discuss in detail the method and its implementation, 
and we evaluate its performance using both simulated 
and real {Fermi and OVRO) data. 

We caution the reader that our proposed algorithm 
assumes perfectly concurrent data and thus does not ad- 
dress any possible effects of non-simultaneity. In addi- 
tion, we stress that our method cannot compensate for 
sample selection effects or incompleteness relative to a 
parent population. For example, if the objects in the ex- 
amined sample do not constitute a representative sample 
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of the blazar population, even when a statistically signif- 
icant correlation between radio and gamma-ray flux den- 
sities can be established in the objects of the observed 
sample, it is not possible to generalize this result to the 
blazar population as a whole. This limitation can only 
be addressed by more careful sample selection. 

This paper is organized as follows. In §2 wc discuss our 
method, and in §3 wc present in detail the implementa- 
tion of the statistical test we have adopted. Demonstra- 
tions of the test and evaluations of its performance are 
presented in §4. We summarize and discuss our conclu- 
sions in §5. 

2. METHOD 
2.1. Small, subjectively selected samples 

The purpose of the test is to quantitatively assess the 
significance of an apparent correlation between concur- 
rent radio and gamma-ray flux densities of blazars in 
the presence of distance effects and subjective sample 
selection criteria. We will do so by testing the hypoth- 
esis that emission in the two wavebands is intrinsically 
uncorrclatcd: we will calculate how frequently a sam- 
ple of objects similar to the sample at hand, with in- 
trinsically uncorrelated gamma/radio luminosities, will 
yield an apparent correlation as strong as the one seen 
in the data, when subjected to the same distance and 
dynamical-range effects as our actual sample. 

In our implementation of the test, the strength of 
the apparent correlation is quantified by the Pearson 
product-moment correlation coefficient r (Fisher 1944), 
defined as 

with {Xi,Yi) in our case being a pair of the logarithms 
of the flux densities in each frequency for a single object. 
The reason for taking the logarithm is two-fold. First, it 
ensures that, for sources with a power-law distribution 
of fluxes, there will not be a clustering of most measure- 
ments around the low-flux corner of the flux-flux plane, 
which would then allow single high-flux outliers to in- 
duce an artificially high r value. Second, it linearizes 
any power-law relation between the variables, which im- 
proves the behavior of correlation measures that target 
specifically the linear correlation between variables (such 
as the Pearson r). 

This test can also be used with any statistic quantify- 
ing correlation strength instead of the Pearson product- 
moment coefficient, including non-parametric correlation 
measures (e.g., Siegel & Castellan 1988; Conover 1999). 

Since the sample selection criteria are assumed to be 
subjective, the challenge in defining our test lies in con- 
structing simulated object samples of intrinsically un- 
correlated gamma/radio flux densities, similar in other 
respects to our actual object sample. In order to over- 
come this difficulty, we use only permutations of mea- 
sured quantities. 

Our method is a variation of a classical permutation 
test for the assessment of a correlation (e.g.. Wall & 
Jenkins 2003, §4.2.3; see also Efron & Petrosian 1998 
for permutation methods for doubly truncated datasets) . 
Its novelty lies in the fact that while we are trying to es- 
tablish a correlation between flux densities and calculate 



a distribution of correlation measures in simulated sets 
of flux density logarithms, we perform permutations in 
luminosity space (see also Fender & Hendry 2000 for a 
similar Monte Carlo approach of evaluating an apparent 
distance-squared effect and the possible effect of Doppler 
beaming in the case of radio data of persistent X-ray bi- 
naries) . In this way, we can simulate the effect of a com- 
mon distance on intrinsically uncorrelated luminosities, 
by applying a common redshift to permuted luminosity 
pairs to return to flux space. By assessing the significance 
in flux space we avoid Malmquist bias, and we automat- 
ically account for the limited flux dynamical ranges in 
the two frequencies under consideration. 
We do so as follows: 

1. Prom the measured radio and gamma-ray flux den- 
sities, we calculate radio and gamma-ray luminosi- 
ties at a common rest-frame radio frequency and 
rest-frame gamma-ray energy. 

2. We permute the evaluated luminosities, to sim- 
ulate objects with intrinsically uncorrelated ra- 
dio/gamma luminosities. 

3. We assign a common redshift (one of the redshifts 
of the objects in our sample, randomly selected) 
to each luminosity pair, and return to flux-density 
space. Assigning a common redshift allows us to 
simulate the common-distance effect on uncorre- 
lated luminosities. Using measured redshifts and 
luminosities guarantees that the distance and lumi- 
nosity dynamical range in our simulated samples is 
also identical to that of our actual sample. 

4. To avoid apparent correlations induced by a sin- 
gle very bright or very faint object much brighter 
or fainter than the objects in our actual sample, 
we reject any flux-density pairs where one of the 
flux densities is outside the flux-density dynamical 
range in our original sample. 

Using a randomly selected set of flux density pairs, 
with number equal to the number of objects in our actual 
sample, we calculate a value for r. We repeat the pro- 
cess a large number of times, and calculate a distribution 
of ) — values for intrinsically uncorrelated flux densities. 
The fraction of |r| > r^ata, where rjata is the r— value 
for the observed flux densities, is the probability to have 
obtained an apparent correlation at least as strong as 
the one seen in the data from a sample with intrinsically 
uncorrelated gamma-ray /radio emission. This quantifies 
the statistical significance of the observed correlation. 

Formally, the null hypothesis tested with this proce- 
dure is Hq : The radio and gamma-ray luminosity of 
blazars are independent, and redshift is independent of 
both luminosities. We note that in many cases, this is 
not the hypothesis we would like to be testing, as lumi- 
nosities depend on redshift in most population models of 
active galactic nuclei. Ideally, we woiild like to test for 
independence between radio and gamma-ray luminosi- 
ties conditioned on redshift. However, this is not always 
practically possible due to sample size and redshift span 
of the sources. For the cases when the sample size is large 
enough and the sources included in the sample are ad- 
equately spread over redshifts, the test discussed in the 
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next subsection will fulfill this requirement. For cases 
however when sample limitations are prohibitive for such 
a study, we show that testing Hq with the implementa- 
tion presented in this work can provide a conservative 
alternative to the full problem; if Hq is rejected with 
high significance, then it is safe to assume that radio and 
gamma-ray luminosities are also not independent condi- 
tioned on redshift. However, if Hq cannot be rejected, no 
conclusion can be reached for either hypothesis, as ab- 
sence of evidence for a correlation is not equivalent with 
evidence for absence of a correlation. 

2.2. Larger samples: splitting the sample in redshift bins 

The process of pair rejection discussed in step 4 above 
may alter the distribution of luminosities, fiuxes, and 
redshifts of the randomized data and introduce substan- 
tial differences from the corresponding distributions of 
the original dataset. 

The cause of this effect is the randomization of red- 
shifts among all sources, and it is straight-forward to 
understand. Low-luminosity nearby objects, when com- 
bined with large redshifts, will result in very faint fluxes 
which are outside the original flux dynamical range and 
thus rejected. For this reason, the simulated datasets 
will have fewer very-low-luminosity objects compared to 
the original dataset. In addition, rare, high-luminosity, 
high-redshift objects, when combined with low redshifts, 
will result in very high fluxes, also outside the original 
flux dynamical range and thus rejected. For this reason, 
the simulated datasets will also have fewer very-high- 
luminosity objects compared to the original dataset. In 
contrast, the number of intermediate-luminosity objects 
will be relatively enhanced in simulated datasets. The 
distributions of redshifts and fluxes of the simulated 
datasets will also be altered for similar reasons. 

If the pair rejection rate is high, the properties of the 
simulated datasets could be different than the proper- 
ties of the original dataset, and these biases could affect 
our estimation of a correlation signiflcance. In small and 
subjectively selected datasets, this problem is a neces- 
sary evil. The effect of these biases is, as we will show 
below, to worsen the estimated significance of a correla- 
tion, rather than induce false positives of enhanced sig- 
nificance. However, in the case of larger samples, there is 
a simple alteration in the methodology described in ^ 32. II 
that can significantly alleviate these biases: splitting the 
sample in redshift bins. 

In this variation of the test, the original sample is split 
into a number of bins dependent on the available number 
of objects (as we discuss below, we need about 10 ob- 
jects or more per bin, and in any case no fewer than 8). 
We then generate randomized flux-density pairs in each 
redshift bin with the process described above. Because 
the range of redshifts that are permuted between objects 
of different luminosities is much smaller, the likelihood 
that one of the resulting randomized flux densities will 
exceed the flux-density dynamical range of the original 
dataset is much smaller. As a result, the pair rejection 
rate is decreased, and the luminosity, redshift, and flux 
distributions of the randomized data pairs resemble more 
closely those of the original dataset. 

The similarity between distributions of the random- 
ized and the original data increases as the size of the 
sample increases and the width of each redshift bin de- 



creases. If the original dataset is also a statistically com- 
plete and flux-limited sample, then the test asymptot- 
ically approaches the luminosity-function-sampling test 
as the size of the original dataset approaches inflnity and 
the size of each redshift bin used approaches zero. This 
can be understood as follows. In the limit of zero-size 
redshift bin, all objects within a single redshift bin are 
at the same distance. Therefore, permuting the luminosi- 
ties of objects at that distance is equivalent to forming 
luminosity pairs by randomly sampling each frequency's 
luminosity function at a speciflc redshift and with a spe- 
ciflc flux-density limit (the limit of the original sample). 
Repeating the process at all redshift bins is equivalent to 
sampling the luminosity functions at all redshifts. Then, 
the "pool" of randomized data pairs, from which we draw 
the mock datasets, could have been equivalently gener- 
ated through luminosity function sampling. 

Formally, the null hypothesis tested with this pro- 
cedure is Hq : Conditional on redshift, the radio and 
gamma-ray luminosity of blazars are independent, which 
is the hypothesis that one would generally wish to test. 
For this reason, this version of the test should be pre- 
ferred whenever possible. 

3. IMPLEMENTATION 

In this section we describe how the method discussed 
above can be implemented in practice for small and large 
datasets. 

3.1. Small, subjectively selected samples 

The flrst step is to convert the blazar gamma-ray fluxes 
(which are usually reported as integrated photon fluxes 
F above some flducial energy Eq, usually 100 MeV), to 
energy flux densities, so that the comparison with radio 
flux densities can be done on an equal footing^. We do 
so by assuming that the photon fluxes are power laws, 
so that the flux (number of photons per unit area-time- 
energy bin) is 



dNpi,oton _ f E 
dEdAdt" ° I Eq 
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In this case, the gamma-ray energy fiux density S-y — 
dE/dEdAdt at Eq is given by S.y{Eo) = FqEq = F(r - 
1) and its energy dependence is 



^^(£;) = (r~i)F 
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The relation between monochromatic fiux density S{v) 
and monochromatic luminosity L{i') for a source at red- 
shift z is 



where d — (c/Hq) dz/y/n\ ~\- flm{^ + z)^- Here Hq 
is the present-day value of the Hubble parameter, and 
Q\ and fl„i are the vacuum energy and matter density 
parameters. In this work, we have used fim = 0.26 and 
r^A = 1 — ri,„, consistent with, e.g., Larson et al. (2011). 

^ Other possible choices is to correlate radio flux densities with 
gamma-ray photon fluxes at some particular energy bin, or with 
the integrated photon fluxes themselves (see, for example, Abdo et 
al. 2011). In these cases, Eq. [7]should be changed accordingly. 
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Note that the value of Hq drops out of the calculation 
as d in the formalism we describe below appears only 
in ratios. If the source has a spectral index a so that 
S{i') oc at the frequency of interest, Eq. Q implies 
that the relation between S'(j^) at observer-frame and 
at rest- frame v (the K-correction) is 

L(iy) = 5(i^)47rd2(l + z)i-". (5) 

So if a radio flux density Sr{v) (at observer-frame v) is 
turned into a luminosity density (at rest-frame u) using 
a redshift zi and a spectral index a^, and this lumi- 
nosity density is then returned to flux-density-space (at 
observer-frame using a different redshift z' but the 
same spectral index a^, we can write 

where c?i — d{zi) and d' = d{z'). For the same procedure 
with gamma-ray flux densities and a source at a redshift 
Z2 we can write 

s;,..,.(r-i,.(|y(l±f)\ ,7, 

In practice, we perform the following steps. 

(i) For each blazar, we use the flux density in radio 
and gamma-ray frequency to produce monochro- 
matic luminosities at the same (now rest-frame) 
frequency in the two bands. 

(ii) We construct all possible pairings (excluding the 
original ones) of radio and gamma-ray luminosities 
from our observed sample. 

(iii) We assign a common redshift z' to each permuted 
pair (one of the available redshifts in our sample). 

(iv) We calculate "mock" radio and gamma-ray flux 
densities S'^, S!y for each pair using Eq. ([5])^. 

(v) We accept the pair if both flux densities are within 
our original flrrx-density dynamical range in each 
band, or reject it otherwise. 

(vi) We randomly select N pairs out of all the possi- 
ble combinations, where N is equal to the number 
of our original observations. Each set of N pairs 
is now a simulated dataset of intrinsically uncorre- 
lated flux/flux observations. 

(vii) For each simulated dataset, we compute r using 
Eq. (P), where X, = log(5; J and Y, = log(5; J, 
with i running from 1 to N. 

(viii) We repeat steps (vi-vii) m times, where m is a 
sufficiently large number to sample the underlying 
|r| distribution. In our tests below m is between 
10^ - 10^. 

(ix) We calculate the probability for the observed |r| to 
have occurred through uncorrelated flux densities 
from the |r|— values obtained in step (viii). 

® Equivalently, we can use directly Eqs. ||6} and Q, without 
explicitly calculating luminosities first. 



Our technique can be applied to samples that are very 
small and still yield a reliable estimate of the distribution 
of \r\. The total number of simulated pairs that we can 
construct through our permutation technique from N ob- 
jects is iVpairs = N'^{N — 1) (where we permute both flux 
densities as well as redshifts) . Only a fraction A'surv will 
survive the low- and high- flux-density cuts that ensure 
that the flux-density dynamical range remains the same 
as in the original sample. Assuming a reduction no larger 
than a factor of 5 (i.e. iVgurv ^ -^pairs/5, shown in prac- 
tice to be a conservative assumption), the total number 
of combinations of N pairings different from each other 
by one or more pairs out of a population of iVgurv objects 
then is 

N I 

pair combinations = ^.(^J^l , (8) 

which is ^ 10^ for samples with N > 8. However, 
in small datasets a statistically significant correlation is 
harder to establish, even if the distribution of |r| can be 
estimated with sufhcient statistics. In addition, as we 
will also show in 21 the biases in the luminosity, red- 
shift, and flux distributions of the simulated datasets in- 
troduced due to pair rejections (see discussion in 
tend to worsen the significance that can be established 
through this test. 

3.2. Splitting larger samples in redshift bins 

Whenever the size of the source sample is large enough 
to allow splitting in more than one redshift bins, this 
variation of the test is recommended, as the effect of 
biases introduced through pair rejection decreases with 
increasing number of redshift bins (decreasing redshift 
bin size). 

To implement this variation of the test, we split the 
sample in Nz redshift bins. Our choice for the test im- 
plementation is to use variable redshift bin size, selected 
in such a way that the number of sources in each bin is as 
close to equal as possible, but never fewer than 8. How- 
ever, other choices are also possible (for example, keeping 
the redshift bin size approximately equal; or splitting by 
luminosity distance rather than redshift, and keeping the 
luminosity distance bin size approximately equal) . 

For the sources in each one of the bins, we apply 
steps (i)-(v) of ^ 33. II We then combine all accepted sim- 
ulated data pairs from all redshift bins to generate the 
"pool" of all possible pair combinations. Finally, we ap- 
ply steps (vi) - (ix) to this combined randomized pair 
"pool". 

4. DEMONSTRATIONS OF THE TEST 

In this section we present example applications of our 
tests, using both real and simulated data, to evaluate the 
performance of our proposed test and demonstrate sev- 
eral aspects of its implementation. For the applications 
on real data, we will use gamma-ray flux measurements 
from Fermi LAT and radio flux-density measurements 
from the OVRO 40 M Monitoring Program (Richards et 
al. 2011). In addition, we will use simulated data to eval- 
uate the the performance of the method: its effectiveness 
in rejecting false positives due to common-distance biases 
in correlation assessments, and its power in establishing 
significant correlations when such correlations do exist. 
As a benchmark we will use the face-value estimate of 



6 



the significance for the Pearson correlation coefficient r, 
which evaluates the probability of a certain (or bigger) 
value of r to occur by chance in the "dart-throwing" sce- 
nario (i.e., when pairs are randomly drawn from uncor- 
related Gaussian distributions, assuming that no biases 
exist). In the latter scenario, the significance only de- 
pends on the value of r and the sample size N . In the 
null hypothesis (uncorrelated data), the variable 



t 



(9) 



follows a Student's t-distribution with N — 2 degrees of 
freedom. Using Eq. ^ significances (p-values) can be 
estimated for any given values of r and N by taking the 
two-tail integral of the appropriate t-distribution. In gen- 
eral, the variation of the test with redshift binning is the 
one which we recommend whenever possible (whenever 
sample restrictions allow its use) , and it is the one which 
we have used in our simulated datasets. 

4.1. Demonstrations on real data 

4.1.1. Small sample, no redshift bin splitting 

As an example of a relatively small dataset, we use 
the set of blazars that are included both in the LAT 
bright AGN source list (Abdo et al. 2009, produced us- 
ing three months of LAT observations), as well as in 
the "complete sample" of the OVRO 40 M Monitoring 
Program (Richards et al. 2011). The latter consists of 
the 1158 sources north of —20° declination in the Can- 
didate Gamma-Ray Blazar Survey (CGRaBS) sample, 
which is a sample of 1625 sources, mostly blazars, se- 
lected by their flux and spectral index in radio, and flux 
in X-rays, to resemble the blazars detected by EGRET 
(Healey et al. 2008). The 1158 of the "complete sam- 
ple" are observed approximately twice a week at 15 GHz 
with the Owens Valley Radio Observatory (OVRO) 40 
M Telescope. For this study, we only use sources with 
known redshifts, and for which a sufficient number of 
high-quality 15 GHz observations were taken in the same 
three-month time interval of LAT observations so as to 
produce a meaningful concurrent 15 GHz average fiux 
density (see Richards et al. 2011). This sample contains 
38 sources. 

Figure [T] shows 3-month averaged 15 GHz flux densi- 
ties plotted against 100 MeV observer-frame flux densi- 
ties obtained by integration over the same time interval 
for the 38 blazars in our sample. The error bars in this 
plot are substantially smaller than the scatter of points 
(see, e.g., Ackermann et al. 2011) and have been omitted 
for clarity. An apparent correlation between the radio 
and gamma-ray time-averaged flux densities is obvious, 
however the statistical signiflcance of an intrinsic correla- 
tion between the radio and gamma-ray emission of these 
objects needs to be quantitatively assessed. To this end, 
we apply the data randomization analysis we have intro- 
duced in ^13. II The probability distribution of the values 
of |r| in our simulated samples with intrinsically uncor- 
related radio/gamma luminosities is shown in Fig. [2l 
The vertical arrow in this figure indicates the r— value 
for the observed data, equal to 0.62. From the 38 ob- 
jects in our sample a total number of 38^ x 37 = 53, 428 
permuted pairs were generated. Of those, 13, 003 pairs 
had both gamma-ray and radio flux densities within the 
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Fig. 1. — 3- month averaged concurrent 15 GHz versus 100 MeV 
observer-frame flux densities for the 38 blazars in our sample. 
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r (Pearson product-moment correlation coefficient) 

Fig. 2. — Distribution of |r|— values for 38 blazars of the same 
dynamical range in redshift and radio and gamma-ray flux densi- 
ties and luminosities as blazars in our sample. The vertical arrow 
indicates the r— value for the actual observations (r = 0.62). The 
significance of the correlation is 1.5 X 10~*. 



dynamical range of the original dataset. The accepted 
pairs were used (in 10^ randomly drawn sets of 38) to 
generate the distribution shown in Fig. [51 The proba- 
bility to obtain |r| > 0.62 from intrinsically uncorrelated 
flux-density measurements due to the effect of a common 
distance is 1.5 x 10~^. For comparison, the significance 
estimate ignoring any biases and using only Eq. ^ is 
3.3 X 10 without a careful analysis, we would evalu- 
ate the observed correlation as more significant than we 
do when accounting for common-distance and flux biases, 
as these effects are likely to contribute at least part of 
the observed correlation strength. We will elaborate on 
the origin and quantitative behavior of this discrepancy 
in the following sections. 
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Fig. 3. — 11-month averaged coneurrent 15 GHz versus 100 
MeV observer-frame flux densities for the 160 blazars in the larger 
sample. 



Note that the pair rejection rate is high - only 24% of 
the permuted pairs are within the original flux density 
dynamical range and were accepted; biases introduced 
in the luminosity, flux, and redshift distributions of the 
simulated data are therefore a concern. However, as we 
will show below, were these biases absent, the significance 
of the correlation would improve. 

4.1.2. Larger sample, behavior of test with increasing 
number of redshift bins 

We now turn to a demonstration of the second varia- 
tion of our test, where the sample is split in redshift bins, 
and we discuss the alleviation of biases induced through 
pair rejection, and the improvement of the correlation 
significance with increasing number of bins. 

To allow splitting in enough redshift bins to adequately 
demonstrate the behavior of the test in the many-bins 
limit we use the significantly larger sample of 160 sources 
that: (a) are included in the first year LAT catalog (Abdo 
et al. 2010); (b) are part of the OVRO 40 M telescope 
monitored sample; (c) have known redshifts. This same 
sample has been examined in detail for intrinsic corre- 
lations between 15 GHz flux density and LAT gamma- 
ray fluxes at various energy ranges by Ackermann et 
al. (2011), using the test discussed here^. 

Figure [3] shows 11-month-averaged 15 GHz flux densi- 
ties plotted against 100 MeV observer-frame flux densi- 
ties obtained by integration over the same time interval 
for the 160 blazars in the sample described above. The 
error bars in this plot are again substantially smaller than 
the scatter of points (see, e.g., Ackermann et al. 2011) 
and have been omitted for clarity. Through visual in- 
spection, this sample also appears to feature an apparent 

Here, we use for the gamma-ray band data the 100 MeV flux 
density calculated according to Eq. [3]from integrated photon fluxes 
for E > 100 MeV and using the photon index provided in ILAC, 
which is different that any of the flux densities or integrated fluxes 
examined by Ackermann et al. 2011; this is the origin of the small 
differences in the value of r obtained here for the data. 



correlation between radio and gamma-ray flux densities, 
with scatter comparable to that of the smaller sample of 
ij4.1.1l The correlation coefficient of the data in this case 
is r = 0.48. 

The biases introduced through pair rejection in our 
first variation of the test (where the sample is not 
split in redshift bins) are demonstrated in Figs. H]- 
[6l The luminosities in these figures are in units of 
4:Tt{c/ Hq)'^ Sq, where Hq is the Hubble parameter, and 
5*0 = 1 Jy for 15 GHz source-frame luminosities and 
5*0 = 10-**GeV/s- cm2 - GeV for 100 MeV source- 
frame luminosities. 

Figure |4] shows the fraction of objects in each logarith- 
mic radio luminosity bin for the data (thick black line) 
and the accepted scrambled pairs (thin lines). Differ- 
ent line colors correspond to different numbers of red- 
shift bins, as in the figure legend. When only one red- 
shift bin is used (thin black line, equivalent to the first 
variation of our test), the shape of the luminosity dis- 
tribution of the accepted scrambled pairs has a qualita- 
tively different shape than that of the data: objects in 
the bins corresponding to the ~3 lowest orders of mag- 
nitude in luminosity are significantly underrepresented 
compared to the original sample, because these low lu- 
minosities, corresponding to nearby objects in the data, 
are frequently rejected when they are combined with high 
redshifts and produce very low simulated flux densities 
outside the original flux density dynamical range. When 
we split the sample in a larger number of bins the effect 
is alleviated. At 16 redshift bins the radio luminosity 
distribution of simulated data is very close to that of the 
original data, and it is essentially converged, as it does 
not change appreciably when the number of redshift bins 
is increased to 20. 

A very similar behavior for the gamma-ray luminosity 
distribution is shown in Fig. [5l In the case of the red- 
shift distribution, shown in Fig.[6l both the very low and 
the very high redshift bins are underestimated when no 
data splitting is applied (thin black line). However, at 
16 redshift bins the real and simulated data distributions 
are very similar, and the simulated data distribution is, 
again, converged. In all distributions, as the number of 
redshift bins increases, the difference between data and 
simulated distributions decreases, as a result of the de- 
creasing pair rejection rate which, at 16 redshift bins, is 
< 20% for all bins. 

The behavior of the estimated significance as a function 
of the number of redshift bins is shown in Fig. [71 where we 
have plotted the distribution of the absolute values of the 
correlation coefficients |r| for each test implementation. 
Again, different colors correspond to different numbers of 
redshift bins used as in Figs. HHBl 10® simulations were 
used to produce each curve. The r— value for the data is 
shown with the arrow. The significance of the correlation 
as evaluated with 16 redshift bins is < 10~® (if we fit the 
distribution shown with the blue line in Fig. [7] with a 
Gaussian, we obtain a significance of ~ 10~^). 

Again, using Eq. ([9]) to compare with the simple sig- 
nificance estimate based only on r and N , we find that 
t = 6.88, for which the two-tailed t-distribution with 
158 degrees of freedom yields a much stronger signifi- 
cance of 1.3 X 10~^'^. The reason for this substantial 
difference can be immediately understood qualitatively 
through inspection of Figs. |4] -[61 Both the radio and 
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Fig. 4. — Fraction of objects in each logarithm-in radio luminos- 
ity bin for the data (thick black line) and the accepted scrambled 
pairs (thin lines; different colors correspond to different numbers 
of redshift bins, as in legend). The radio luminosities are in units 
of io, radio = 47r(c/Ho)^'S'o where So = Uy. 
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Fig. 5. — Fraction of objects in each logarithm-in gamma- 
ray luminosity bin for the data (thick black line) and the ac- 
cepted scrambled pairs (thin lines; different colors correspond to 
different numbers of redshift bins, as in legend). The gamma- 
ray luminosities are in units of Lo,-y = 4:n(c/ Hq)^ So where So = 
lO-^GeV/s - cm2 - GeV. 
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Fig. 6. — Fraction of objects in each redshift bin for the data 
(thick black line) and the accepted scrambled pairs (thin lines; 
different colors correspond to different numbers of redshift bins, as 
in legend). 
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Fig. 7. — Distribution of |r|— values for randomly selected f60- 
blazar sets picked from the ensemble of accepted pairs generated 
through data scrambling. The vertical arrow indicates the r— value 
(r = 0.48) for the actual observations. Different colors correspond 
to different numbers of redshift bins, as in legend. 



the gamma-ray luminosity distributions of the data show 
broad peaks, which means that even if there was no in- 
trinsic correlation between radio and gamma-ray emis- 
sion and radio and gamma-ray luminosities were simply 
randomly drawn from these distributions, values of ra- 
dio and gamma-ray luminosity around the peaks would 
appear frequently. As a result, pairs of radio/gamma- 
ray fluxes corresponding to underlying luminosities clus- 
tered around likely values would be common. Such pairs, 
when modulated with a common distance factor, would 



yield an apparent correlation in flux-flux space by chance, 
much more frequently than if there was no peak in the 
luminosity distributions. The unsophisticated signifi- 
cance estimate contains no information about common- 
distance effects and the behavior of the underlying lu- 
minosity distributions, and for this reason overesti- 
mates the significance of the apparent correlation. How- 
ever, even when these effects are accounted for using our 
method, the data show significant intrinsic correlation 
between radio and gamma-ray fluxes. 
We can see that the signiflcance of the correlation 
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monotonically improves with increasing number of bins*. 
The reason for this behavior can be understood from 
Figs, m and [S] The more frequent rejection of pairs at 
the edges of the himinosity distributions resuhs in the 
artificial enhancement of the peaks in the luminosity dis- 
tributions at intermediate luminosities. This stronger 
peak results in an enhanced incidence of artificial cor- 
relations. As a result, the significance of the apparent 
correlation of the data drops. 

This is also the reason for the appearance of a peak at 
positive values of r| in the \r\ distribution of simulated 
datasets in Fig. [7] when the number of redshift bins is 
low. However, it is not guaranteed that a small number 
of redshift bins will generate such a peak at positive \r\ - 
this depends on the details of the luminosity distribution 
of the original dataset and of the pair rejection. For 
example, such a peak does not appear in our smaller 
dataset example in Fig. [2j Conversely, a large number 
of redshift bins does not guarantee that such a peak will 
not appear. If our original dataset is selected in such a 
way that a certain narrow range of luminosities is over- 
represented, then such a peak is intrinsic to the dataset 
and it will appear regardless of number of bins used. 

It is also interesting to consider the behavior of the test 
in the limit of a very large number of redshift bins that 
could be in principle used if we had a very large sample 
available for study, and in the case that our sample was a 
statistically complete, fiux-limited set of sources. In this 
case, each redshift bin could be made very narrow, and all 
sources within the bin would be located essentially at the 
same distance. The set of radio luminosities within each 
bin would then be a representation of the radio luminos- 
ity function at a fixed redshift, with a limiting luminosity 
set by the limiting flux and the bin redshift. The set of 
gamma-ray luminosities within the same bin would sim- 
ilarly be a representation of the gamma-ray luminosity 
function. Since all sources would be located at the same 
distance, data randomization within the bin would never 
produce fluxes outside the original dynamical range, and 
no pairs would be rejected. The simulated pairs would 
then have exactly the same luminosity distribution as 
the data, and they would continue to be a representa- 
tion of the luminosity functions at the two frequencies 
under consideration, as "fair" as the original data. As a 
result, in the limit of the "perfect sample" and a large 
number of redshift bins, our test would yield exactly the 
same result as a statistical test sampling random radio 
and gamma-ray fluxes from known luminosity functions. 
Our test deviates increasingly from this result as the sta- 
tistical properties and the size of the sample deteriorate. 

As shown in Fig. [3 our proposed test is conservative: 
a smaller number of redshift bins will generally result in 
an increased rate of pair rejection and a worse correla- 
tion significance. In this way, it is possible that a real, 
intrinsic correlation cannot be confirmed by this test if 
a poor sample is used. However, the test will not yield 
artificially enhanced significances. 

4.2. Demonstrations on simulated data: 
Uncorrelated datasets 



* The exact statement is that the significance monotonically im- 
proves with decreasing fraction of rejected pairs. Should a particu- 
lar choice in redshift binning result in increased rejection fraction, 
the significance would worsen, even if the number of bins was larger. 



In this section we discuss the performance of the test 
when applied to datasets drawn from intrinsically un- 
correlated populations. In particular, we evaluate the 
effectiveness of our test in rejecting false positives that 
we might have obtained due to common-distance biases 
had we used the estimate of the significance given by 
Eq. In t|4.2.1l we describe how we generate the in- 
trinsically uncorrelated simulated datasets that we use 
to test the performance of our method, and in ^14.2.21 we 
examine this performance and the robustness of the eval- 
uated significances against common-distance biases. 

4.2.1. Generation oj uncorrelated simulated datasets 

To test the performance of our method in the case 
of intrinsically uncorrelated data, we produce simulated 
datasets in the following way. 

• We draw a gamma-ray luminosity from a log- 
normal distribution, with probability density func- 
tion 
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: exp 
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(10) 



• We draw a radio luminosity from a log-normal dis- 
tribution, with probability density function 

\2' 



P{Lr) 
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exp 



(In Lr - f J.2) 

2a| 



(11) 



• For this pair, we also draw a common redshift from 
a uniform distribution with lower limit ziow and 
upper limit Zup- 

• We evaluate the resulting gamma-ray and radio 
fluxes, and check whether they reside within an al- 
lowed flux dynamical range of three orders of mag- 
nitude. If either one does not, we reject the pair 
and repeat the draw. 

• We repeat the process above until we have 30 pairs 
within our desired flux dynamical range. This then 
is our simulated, intrinsically uncorrelated dataset, 
to which both a common distance factor and a limit 
in the flux dynamical range have been applied. 

We anticipate that the effect of the common-distance 
biases will increase as the luminosity dynamical range 
decreases and the redshift dynamical range increases. 
This can be easily understood by considering the extreme 
limits. Datasets drawn from luminosity delta-functions 
will always appear perfectly correlated within errors: the 
spread in fluxes in each waveband is only due to the dis- 
tance factor, which is the same in each pair, and errors. 
Conversely, if all sources are at the same redshift, there 
will be no common-distance effect: the distance factor is 
always the same, and any observed correlation has to be 
intrinsic. 

To assess when common-distance biases become impor- 
tant, we will use the coefficient of variation (eg., Frank 
&; Althoen 1995) of the redshift and luminosity distribu- 
tions (standard deviation in units of the mean, Cz and 
cl respectively) to quantify the dynamical range of each 
distribution. As we will see in the next section, the im- 
portance of common-distance biases is generally depen- 
dent on the ratio of the luminosity coefficient of variation 
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to the redshift coefficient variation, cl/cz, and decreases 
as this ratio increases. In our simulated datasets we have 
used radio and gamma-ray luminosity distributions^ with 
Ml = /i2 = Mo and ai — a2 — uq and, as a result, the 
same value of , but in practice the relevant value of cl 
is the one of the more extended of the two distributions. 
For the distributions we have used here, 
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CL = [exp (cTg) - l] 



1/2 



and 



Zlo 



\/3(Zup + ^low) 



(12) 
(13) 



4.2.2. Robustness of the test against common- distance 

biases 

To evaluate the robustness of our test against common- 
distance biases and its ability to reject false positives, we 
generate, using the procedure described in !j4.2.1[ simu- 
lated datasets with varying values of the ratio cl/cz of 
30 objects each, and we calculate the significance of the 
apparent correlation using our method and the simple 
estimate of Eq. ^ . 

In practice, we implement the simulated dataset gen- 
eration for a specific value of cl/cz in two distinct ways, 
and we compare the results as shown in Fig. |S1 First, we 
keep the redshift distribution fixed to a uniform distribu- 
tion with lower limit ziow — and an upper limit z^p = 2, 
and we draw the radio and gamma-ray luminosities from 
identical distributions with /xq = and a varying value 
of tTo- In this way, we derive the black points in Fig. 
[H Next, we keep the luminosity distributions fixed at 
Mo = 0, ctq = 1: and we draw redshifts from uniform dis- 
tributions with varying upper and lower limits, always 
symmetric about z = I. In this way, we derive the red 
points in Fig. El 

For each dataset, we then evaluate the significance of 
the apparent correlation using the variation of our test 
that utilizes redshift binning; these results are shown 
with the circles/solid lines in Fig. |S1 We compare these 
values with the significance estimate of Eq. ([9|) which 
does not account for any common-distance bias; these re- 
sults are shown with the diamonds/dashed lines in Fig.[8l 
For low values of cl/cz, the simple estimate of Eq. ([9]) 
returns false positives with high significance for these in- 
trinsically uncorrelated datasets. Our method however 
correctly identifies these apparent correlations as arti- 
facts of common-distance biases, and returns a signifi- 
cance value always consistent with no correlation. For 
higher values of c^/cz, common-distance biases are less 
important, and both significance estimates agree, return- 
ing a result consistent with no correlation. 

The roughly consistent, within noise, behavior of the 
black and red lines, despite the different method of im- 
plementation of the same value of cl/cz, implies that the 
cl/cz ratio is a good way to quantify the way in which 
the dynamical ranges in the luminosity and redshift dis- 
tributions induce common-distance biases in correlations 

^ Since the flux/flux correlation coefficient is evaluated in log- 
arithmic space, changing the units of the luminosity, or, equiva- 
lently, the mean of the luminosity distribution, will only uniformly 
slide the points along the flux axes and will not affect the appar- 
ent correlation strength, as long as the flux limits are also shifted 
accordingly. 
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Fig. 8. — Significance (probability to obtain an r as big or bigger 
than the data by chance) returned by our method (circles, solid 
lines) compared to significance returned by the simple estimate of 
Eq. ^ which does not account for common distance biases, as a 
function of the ratio of coefficients of variation of the luminosity 
and redshift distributions. Black points were generated by varying 
the width of the luminosity distribution while keeping the red- 
shift distribution fixed. Red points were generated by varying the 
width of the redshift distribution while keeping the luminosity dis- 
tribution fixed. Our method always succeeds in rejecting artificial 
correlations induced by common-distance biases. 



between different wavebands evaluated in flux space. As 
a rule of thumb, a value of c^/cz smaller than about 
5 indicates that common-distance biases may be impor- 
tant, and the simple estimate of Eq. (j9|) (or, equivalently, 
permutation methods in flux space alone which do not 
account for the common distance modulation in each flux 
pair) should not be trusted as they might yield false pos- 
itives. 

4.3. Demonstrations on simulated data: 
Correlated datasets 

In the previous section we have shown that our pro- 
posed method successfully accounts for common-distance 
biases and returns results consistent with no correlation 
even when the simple estimate of Eq. (j9l) yields very sig- 
nificant false positives. Here we wish to examine whether 
this robustness against false positives comes at the ex- 
pense of the power of the test in rejecting the null hy- 
pothesis of no correlation when the data are intrinsically 
correlated. In i i4.3.1l we discuss how we generate intrinsi- 
cally correlated datasets with minimal common-distance 
biases, and in §4.3.2l we discuss how the power of the test 
depends on the number of objects in the dataset. A'', and 
on the apparent correlation strength, r, as well as how 
these dependencies compare with the simple formula of 
Eq.El 

4.3.1. Generation of correlated datasets 

To generate mock datasets with known intrinsic corre- 
lation signals, we assume that the radio and gamma-ray 
monochromatic luminosities at the frequencies of inter- 
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est are linearly correlated^°, 
a log-normal distribution: 



with some scatter obeying 



log Lr = C + log + A log Lr 



(14) 



where C is a normalization constant, and A logi^ is nor- 
mally distributed with mean and standard deviation a, 
i.e. if A log Lj. — X then the probability density of x is 
given by 



p(x) 



1 



exp 
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(15) 



Using Eq. [5l this yields a relationship between radio and 
gamma-ray flux densities: 



Sr,0 



<J7,0 



hr-1 



X 10 



A log r 



(16) 



The scatter in this intrinsic correlation is quantified 
by cr. We normalize the relation assuming that, for 
z — Alogr — 0, a 15 GHz radio flux density 
of 1 Jy corresponds to a gamma-ray flux density of 
lO-^GeVcm-^s-iGeV"^ at 300 MeV. 

We generate mock datasets by starting from the set 
of 136 sources which (a) are detected by Fermi LAT at 
energies between 300 MeV and 1 GeV and are included 
in the First Fermi Catalog (ILAC, Abdo et al. 2010); 

(b) are included in the OVRO 40 M monitoring sample; 

(c) have known redshifts (see Ackermann et al. 2011 for 
the details of this sample). We use this set to obtain 
redshifts, gamma-ray fluxes, and gamma-ray spectral in- 
dices for our sources; for radio spectral indices, we use the 
historical values quoted in Ackermann et al. 2011. We 
then use Eq. (1161) to obtain radio fluxes with a known 
correlation signal, by using the desired value of a. 

The value of the cl/cz ratio in the sample we use 
is ~ 4, so, according to the findings of ^ 34.2.21 the ef- 
fect of common-distance biases should be limited, and 
any apparent correlation between gamma-ray and radio 
emission should be primarily due to the intrinsic corre- 
lations we have imposed in the simulated datasets. In 
this case, we would expect a well-behaved test to return 
results that are close to the simple significance estimates 
of Eq. dSl). 

Figure [9] shows the distribution of Pearson product- 
moment correlation coefficients r that arise from ran- 
dom realizations of 80 objects obtained in the manner 
described above, for various values of the intrinsic cor- 
relation scatter cr. The striking feature of this plot is 
that the distributions of possible r values of "observed" 
flux/flux correlations arising from different random real- 
izations of the same intrinsic luminosity /luminosity cor- 
relation sampled with the same number of points can be 
quite extended, with its width increasing with increasing 
cr. 

Even if we assumed that we knew the form of the un- 
derlying intrinsic luminosity /luminosity correlation, i.e. 
if, in our case, we assumed that Eq. holds exactly, 
and even with a relatively large sample (80 objects in 
this case), the observed value of the flux/flux correla- 
tion coefficient would only yield a rough and uncertain 

We adopt this assumption in the interest of simplicity for 
these demonstrations; this does not have to be the case in nature. 
NonUnear correlations between the luminosities in the two wave- 
bands will further complicate the relation between intrinsic and 
apparent correlation strength. 




Fig. 9. — Distribution of Pearson product-moment correlation 
coefficients r that arise from random realizations of 80 objects ob- 
tained using 80 randomly chosen Fermi sources from ILAC and 
Eq. 1161 for various values of cr as in legend. 



estimate of the scatter cr of the underlying correlation, 
although the uncertainty of the estimate would improve 
for increasing values of r (decreasing values of cr) . 

4.3.2. Dependence of significance on number of 
observations and apparent correlation strength 

Figure [10] shows the dependence of the calculated sig- 
nificance of a correlation of fixed apparent and intrin- 
sic strength (i.e, fixed values of r and a) on the num- 
ber of objects in the sample. In the example presented 
here, mock datasets of N objects were generated as 
described in ^ 34.3.1) using a fixed intrinsic correlation 
scatter cr — 0.4, and requiring an apparent correlation 
strength ofr = 0.55±0.001. For every value of N plot- 
ted in Fig.lini 10 such mock datasets were produced, and 
for each dataset the significance was evaluated using the 
redshift-bin-splitting variation of the test (with the num- 
ber of redshift bins chosen, for each value of N, as dis- 
cussed in §3.21 ) The datapoints in Fig. [TOl represent the 
the mean of logj^Q (Significance) for these 10 realizations, 
and the error bars indicate the standard deviation of the 
10 values of log^Q (Significance). The solid line shows the 
result of Eq. ^ for r = 0.55 and varying N. Even this 
modest correlation with appreciable scatter can be es- 
tablished at high significance (better than ~ 10~^) with 
60 or more objects. 

FigurefTDshows the dependence of the significance that 
the redshift-bin-splitting variation of our method yields 
as a function of the apparent correlation strength (as 
quantified by r), when the underlying, intrinsic corre- 
lation and the number of objects are fixed. We have 
used an intrinsic correlation with relatively large scat- 
ter (cr = 0.6), sampled with a relatively small number 
of objects {N = 20). As it is obvious from Fig. [HI (red 
line), this large scatter can result in a variety of appar- 
ent correlation values. Again, for each value of r plot- 
ted in Fig. [TTl we have generated 10 mock datasets as 
described in ^14.3.11 demanding that their apparent cor- 
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Fig. 10. — Significance of an intrinsic correlation with a = 0.4 
sampled with objects and resulting in an apparent correlation 
strength of r = 0.55 it 0.001, as a function of N. The points 
indicate the mean and error bars indicate the standard deviation 
of the calculated significances in 10 random implementations of the 
correlation. The downwards triangle indicates an upper limit for 
A'^=100, where the probability of the correlation to arise by chance 
was always found to be < 10~^ (none out of lO'^ scrambled datasets 
had an \r\ at least as big as the "data"). The solid line shows the 
result of Eq. JgJ for r = 0.55 and varying A^. 
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Fig. 11. — Significance of an intrinsic correlation with a = 0.6 
sampled with 20 objects and resulting in an apparent correlation 
strength of varying r, as a function of r. The points indicate the 
mean and error bars indicate the Ic variation of the calculated 
significances in 10 random implementations of the correlation. The 
solid line shows the result of Eq. ^ for A'^ = 20 and varying r. 



relation strength is within 0.001 of the plotted r value. 
For each dataset the significance was evaluated using the 
redshift-bin-splitting variation of the test (with the num- 
ber of redshift bins chosen, for each value of A^, as dis- 
cussed in The datapoints in Fig. [TT] represent the 
the mean of log]^o(Sigi^ificance) for these 10 realizations, 
and the error bars indicate the standard deviation of the 



10 values of logj^Q (Significance). The solid line shows the 
result of Eq. © for = 20 and varying r. 

As we can see in both Fig. [TU] and Fig. [TTl the signif- 
icance returned by our method in the absence of strong 
common-distance biases and for correlated datasets is 
consistent with, or only marginally worse than the results 
of the simple estimate of Eq. ©. We therefore conclude 
that the robustness of our method against common- 
distance biases in uncorrelated datasets does not come 
at the expense of the power of the method in the case of 
correlated datasets. 

5. DISCUSSION AND CONCLUSIONS 

In this paper, we have introduced a data- 
randomization method for assessing the significance of 
apparent correlations between radio and gamma-ray 
emission in blazar jets, accounting explicitly for biases 
introduced through a common distance, small sample 
size, and complex or subjective sample selection criteria. 
Our method is designed to be conservative and appli- 
cable even to small samples selected with subjective 
criteria. 

An application of this technique to the first Fermi cat- 
alog of point sources (Abdo et al. 2010) has been dis- 
cussed in Ackermann et al. (2011), which also discusses 
the dependency of the strength and significance of the 
radio/gamma flux correlation on gamma-ray photon en- 
ergy, and on the concurrency of the two datasets. A 
study of the dependency of correlation strength and sig- 
nificance on radio frequency can be assessed by applica- 
tion of this method to multi-frequency radio monitoring 
data, such as the results of the F-GAMMA Program (An- 
gelakis et al. 2010; Fuhrmann et al. 2012, in preparation). 

Using simulated datasets of intrinsically uncorrelated 
data, we have demonstrated that our proposed method is 
robust against artificial correlations induced by common- 
distance biases, and returns results consistent with no 
correlation even when simple face-value significance es- 
timates that do not account for these biases would have 
incorrectly claimed highly significant correlations. We 
have shown that the effect of these biases can be quan- 
tified by the ratio of the coefficients of variation of the 
luminosity distribution (in the waveband which has the 
widest luminosity distribution) over the redshift distri- 
bution. When this ratio is lower than ^ 5 false positives 
are possible when the biases are not accounted for, with 
their significance increasing with decreasing value of the 
ratio. In addition, using simulated datasets of intrinsi- 
cally correlated data, we have shown that our method 
can establish existing correlations with significance com- 
parable to that of other tests, and thus its robustness 
against false positives does not come at the expense of 
its power in rejecting the null hypothesis. 

As our method is designed to be applied to astronom- 
ical datasets, we have implemented it in such a way to 
directly address the limited flux dynamical range that 
is generally encountered in such data. Astronomical 
datasets are generally expected to have a low-fiux limit 
in each frequency due to the limited sensitivity of any 
given observing instrument, and a high-flux limit corre- 
sponding to the most favorable combination of luminos- 
ity/distance that happened to occur given our position 
in the universe, which is generally determined by chance, 
and fixed by the observed dataset. Simulated data with 
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one or two of the fluxes outside these Units represent 
situations possible in nature but impossible to observe 
in the specific experiment. Such simulated data coud 
result in apparent correlations in our simulated samples 
induced by a single very bright or very faint object much 
brighter or fainter than the objects in our actual sam- 
ple, while most of the other pairs would be scattered in 
a limited area of the flux/flux space (the classical case of 
an artificial correlation seen when an uncorrelated scat- 
tered cluster of points is combined with a single point far 
away from the main cluster) . Such configurations would 
be impossible in our actual observed datasets, but could 
be frequent in our simulated datasets, thus biasing the 
simulated datasets toward much higher correlation coef- 
ficients, and artificially reducing the significance of any 
correlation seen in the observed datasets. We exercise 
care to avoid this bias by limiting the fiux dynamical 
range of our randomized data to that of the observed 
sample and by rejecting simulated flux pairs outside this 
range. 

We caution the reader that our test does not in any way 
account for the effects of non-simultaneity. Ideally the 
test should be applied to data in different frequencies av- 
eraged over the same time interval. The spectral indices 
used in each band to implement the K-correction should 
also be concurrently measured with the flux averages. 
Such concurrent spectral indices are straight-forward to 
obtain in gamma rays, however radio monitoring is rou- 
tinely performed at a single waveband (as is the case 
for the OVRO 40 M Monitoring Program), and in prac- 
tice only archival radio spectral indices are available. It 
is thus fortunate that our test is robust against small 
changes in the value of the radio spectral index used in 
the K-correction. This property of the test can be under- 
stood by taking into account that blazars are spectrally 
flat at radio frequencies so the effect of the K-correction 
in radio is small to begin with. We have confirmed this 
by alternatively using archival radio spectral indices mea- 
sured for each source, or a uniform value of — —0.5 
across all sources; the evaluated significance in the two 
cases did not change appreciably (fractional change in 
the quoted significance less than 0.01). This result can 
be explicitly confirmed by using multi-frequency simulta- 
neous data from the F-GAMMA Program where simul- 
taneous radio spectral indices can be obtained. These 
tests are described in detail in Fuhrmann et al. 2012 (in 
preparation) . 
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In contrast, the test is quite sensitive to the redshift 
of the sources included in the sample under considera- 
tion, as shown in Ackermann et al. (2011): as the main 
purpose of the test is to assess the effect of distance bi- 
ases, the calculations involved are sensitively dependent 
on said distances. For this reason the test should only be 
used on samples with known redshifts for all members. 

Finally, we stress that this test in itself does not assess 
the strength of the intrinsic correlation between flux den- 
sities of different frequencies. It only addresses its sta- 
tistical significance, i.e. the probability that an apparent 
correlation as strong or stronger than the observed one 
can be obtained from intrinsically uncorrelated data due 
to observational biases. A correlation may be very weak 
but picked up at high significance if the dataset is large 
and the data quality is high; in contrast, a strong corre- 
lation in a very small sample may not be very significant. 
It is similarly important when the test returns a low sta- 
tistical significance to carefully distinguish between lack 
of evidence for correlation and evidence for intrinsically 
uncorrelated data. Intrinsic lack of correlation cannot 
generally be established. However it is possible to show 
that a correlation with strength above a certain threshold 
would have been picked up at a given level of significance 
by a particular test. 
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